Models and model output

Functions such as lm, glm, lmer, glmmTMB … allow us to fit models

e.g. fit french fry rating with respect to all designed main effects:

ff_long <- french_fries |> pivot_longer(potato:painty, names_to = "type", values_to = "rating")
ff_lm <- lm(rating~type+treatment+time+subject, 
data=ff_long)

summary, str, resid, fitted, coef, … allow us to extract different parts of a model for a linear model. Other model functions work differently … major headaches!

summary(ff_lm)

Call:
lm(formula = rating ~ type + treatment + time + subject, data = ff_long)

Residuals:
   Min     1Q Median     3Q    Max 
-7.073 -1.967 -0.464  1.520 10.219 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   1.1985     0.2579    4.65  3.5e-06 ***
typegrassy   -1.1551     0.1557   -7.42  1.5e-13 ***
typepainty    0.7005     0.1558    4.50  7.1e-06 ***
typepotato    5.1332     0.1557   32.96  < 2e-16 ***
typerancid    2.0329     0.1557   13.06  < 2e-16 ***
treatment2   -0.0513     0.1205   -0.43    0.670    
treatment3   -0.0455     0.1206   -0.38    0.706    
time2         0.1019     0.2161    0.47    0.637    
time3        -0.0136     0.2161   -0.06    0.950    
time4        -0.1025     0.2161   -0.47    0.635    
time5        -0.1330     0.2169   -0.61    0.540    
time6        -0.0461     0.2161   -0.21    0.831    
time7        -0.2462     0.2163   -1.14    0.255    
time8        -0.1163     0.2166   -0.54    0.591    
time9         0.1319     0.2278    0.58    0.563    
time10        0.5463     0.2278    2.40    0.017 *  
subject10     1.7145     0.2439    7.03  2.5e-12 ***
subject15    -0.3591     0.2449   -1.47    0.143    
subject16     0.4752     0.2441    1.95    0.052 .  
subject19     2.0165     0.2439    8.27  < 2e-16 ***
subject31     1.4928     0.2510    5.95  3.0e-09 ***
subject51     1.8728     0.2439    7.68  2.1e-14 ***
subject52     0.1948     0.2439    0.80    0.424    
subject63     0.9605     0.2439    3.94  8.4e-05 ***
subject78    -0.5828     0.2439   -2.39    0.017 *  
subject79    -0.5370     0.2503   -2.15    0.032 *  
subject86     0.4354     0.2510    1.73    0.083 .  
---
Signif. codes:  
0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.9 on 3444 degrees of freedom
  (9 observations deleted due to missingness)
Multiple R-squared:   0.4,  Adjusted R-squared:  0.395 
F-statistic: 88.1 on 26 and 3444 DF,  p-value: <2e-16

Tidying model output

The package broom (broom.mixed for glmmTMB) gets model results into a tidy format at different levels

  • model level: broom::glance (values such as AIC, BIC, model fit, …)
  • coefficients in the model: broom::tidy (estimate, confidence interval, significance level, …)
  • observation level: broom::augment (fitted values, residuals, predictions, influence, …)

Goodness of fit statistics: glance

glance(ff_lm)
# A tibble: 1 × 12
  r.squared adj.r.squared sigma statistic p.value    df
      <dbl>         <dbl> <dbl>     <dbl>   <dbl> <dbl>
1     0.400         0.395  2.90      88.1       0    26
# ℹ 6 more variables: logLik <dbl>, AIC <dbl>, BIC <dbl>,
#   deviance <dbl>, df.residual <int>, nobs <int>

Model estimates: tidy

ff_lm_tidy <- tidy(ff_lm)
head(ff_lm_tidy)
# A tibble: 6 × 5
  term        estimate std.error statistic   p.value
  <chr>          <dbl>     <dbl>     <dbl>     <dbl>
1 (Intercept)   1.20       0.258     4.65  3.50e-  6
2 typegrassy   -1.16       0.156    -7.42  1.49e- 13
3 typepainty    0.701      0.156     4.50  7.11e-  6
4 typepotato    5.13       0.156    33.0   2.28e-207
5 typerancid    2.03       0.156    13.1   4.69e- 38
6 treatment2   -0.0513     0.121    -0.426 6.70e-  1

Model diagnostics: augment

ff_lm_all <- augment(ff_lm)
glimpse(ff_lm_all)
Rows: 3,471
Columns: 12
$ .rownames  <chr> "1", "2", "3", "4", "5", "6", "7", "8",…
$ rating     <dbl> 2.9, 0.0, 0.0, 0.0, 5.5, 14.0, 0.0, 0.0…
$ type       <chr> "potato", "buttery", "grassy", "rancid"…
$ treatment  <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ time       <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ subject    <fct> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 10, 10, 1…
$ .fitted    <dbl> 6.332, 1.199, 0.043, 3.231, 1.899, 6.33…
$ .resid     <dbl> -3.432, -1.199, -0.043, -3.231, 3.601, …
$ .hat       <dbl> 0.0079, 0.0079, 0.0079, 0.0079, 0.0079,…
$ .sigma     <dbl> 2.9, 2.9, 2.9, 2.9, 2.9, 2.9, 2.9, 2.9,…
$ .cooksd    <dbl> 4.2e-04, 5.1e-05, 6.7e-08, 3.7e-04, 4.6…
$ .std.resid <dbl> -1.188, -0.415, -0.015, -1.119, 1.247, …

Residual plot

ggplot(ff_lm_all, aes(x=.fitted, y=.resid)) + geom_point()

Adding random effects

Add random intercepts for each subject

fries_lmer <- lmer(rating~type+treatment+time + ( 1 | subject ), 
data = ff_long)

Your turn

  • Run the code of the examples so far

The broom.mixed package provides similar functionality to mixed effects models as broom does for fixed effects models

  • Try out the functionality of broom.mixed on each level: model, parameters, and data
  • Augment the ff_long data with the model diagnostics
  • Plot the residuals from the random effects model
  • Plot fitted values against observed values

05:00

Statistical models

Computational models

Forecasting

Resources